The Download: Making AI Work, and why the Moltbook hype is similar to Pokémon
Are you interested in learning more about how AI is being used? We've launched a new weekly newsletter series exploring just that: digging into how generative AI is being deployed across sectors and what professionals need to know to apply it in their everyday work. Each edition of Making AI Work begins with a case study examining a specific use of AI in a given industry. Then we take a deeper look at the AI tool involved, with more context about how other companies or sectors are employing that same tool or system. Finally, we end with action-oriented tips to help you apply it yourself. The first edition looks at how AI is changing health care, exploring the future of medical note-taking through the Microsoft Copilot tool used by doctors at Vanderbilt University Medical Center.
- North America > United States > New York (0.05)
- North America > United States > Massachusetts (0.05)
- Europe > Iceland (0.05)
- (2 more...)
- Health & Medicine > Health Care Providers & Services (0.69)
- Leisure & Entertainment > Games > Computer Games (0.42)
SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
Zhang, Xiaoyang, Li, Jinjiang, Fan, Guodong, Ju, Yakun, Fan, Linwei, Liu, Jun, Kot, Alex C.
Infrared and visible image fusion (IVIF) aims to combine the thermal radiation information from infrared images with the rich texture details from visible images to enhance perceptual capabilities for downstream visual tasks. However, existing methods often fail to preserve key targets due to a lack of deep semantic understanding of the scene, while the fusion process itself can also introduce artifacts and detail loss, severely compromising both image quality and task performance. To address these issues, this paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to achieve high-fidelity and semantically-aware image fusion. The core of our method is to utilize high-quality semantic masks generated by SAM as explicit priors to guide the optimization of the fusion process via a conditional diffusion model. Specifically, the framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks from SAM jointly with the preliminary fused image as a condition to drive the diffusion model's coarse-to-fine denoising generation. This ensures the fusion process not only has explicit semantic directionality but also guarantees the high fidelity of the final result. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, as well as in its adaptability to downstream tasks, providing a powerful solution to the core challenges in image fusion. The code of SGDFuse is available at https://github.com/boshizhang123/SGDFuse.
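The mask-guided, two-stage idea can be illustrated with a toy computation. This is a sketch, not the paper's diffusion model: `preliminary_fuse` stands in for the multi-modal fusion stage, and `mask_guided_refine` stands in for one conditional correction step in which the SAM mask prioritizes key regions. All function names and values are illustrative.

```python
import numpy as np

def preliminary_fuse(ir, vis, w=0.5):
    # Naive stand-in for stage one (multi-modal feature fusion):
    # a weighted average of infrared and visible intensities.
    return w * ir + (1 - w) * vis

def mask_guided_refine(fused, mask, target, strength=0.5):
    # Stand-in for one mask-conditioned refinement step: pull the fused
    # image toward the target only inside the SAM mask, mimicking how
    # the semantic prior steers the denoising toward key targets.
    return fused + strength * mask * (target - fused)

ir = np.array([[0.9, 0.1], [0.8, 0.2]])    # thermal intensities
vis = np.array([[0.3, 0.7], [0.4, 0.6]])   # visible intensities
mask = np.array([[1.0, 0.0], [1.0, 0.0]])  # SAM: left column is a key target

fused = preliminary_fuse(ir, vis)
refined = mask_guided_refine(fused, mask, target=ir)
```

Inside the mask the result moves toward the infrared target; outside it the preliminary fusion is left untouched, which is the intuition behind "explicit semantic directionality."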
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (2 more...)
Survey of Vision-Language-Action Models for Embodied Manipulation
Li, Haoran, Chen, Yuhui, Cui, Wenbo, Liu, Weiheng, Liu, Kai, Zhou, Mingcai, Zhang, Zhengtao, Zhao, Dongbin
Embodied intelligence systems, which enhance agent capabilities through continuous environment interaction, have garnered significant attention from both academia and industry. Vision-Language-Action (VLA) models, inspired by advances in large foundation models, serve as universal robotic control frameworks that substantially improve agent-environment interaction in embodied intelligence systems, broadening the application scenarios for embodied AI robots. This survey comprehensively reviews VLA models for embodied manipulation. First, it chronicles the developmental trajectory of VLA architectures. It then analyzes current research across five critical dimensions: VLA model structures, training datasets, pre-training methods, post-training methods, and model evaluation. Finally, it synthesizes key challenges in VLA development and real-world deployment and outlines promising future research directions.
Potential Indicator for Continuous Emotion Arousal by Dynamic Neural Synchrony
Pan, Guandong, Wu, Zhaobang, Yang, Yaqian, Wang, Xin, Liu, Longzhao, Zheng, Zhiming, Tang, Shaoting
The need for automatic and high-quality emotion annotation is paramount in applications such as continuous emotion recognition and video highlight detection, yet achieving this through manual human annotation is challenging. Inspired by inter-subject correlation (ISC) as used in neuroscience, this study introduces a novel electroencephalography (EEG)-based ISC methodology that leverages a single-electrode, feature-based dynamic approach. Our contributions are threefold. First, we re-identify two potent emotion features suitable for classifying emotions: first-order difference (FD) and differential entropy (DE). Second, through overall correlation analysis, we demonstrate the heterogeneous synchronized performance of electrodes; this performance aligns with neural emotion patterns established in prior studies, validating the effectiveness of our approach. Third, by employing a sliding-window correlation technique, we show the significant consistency of dynamic ISCs across features and key electrodes in each analyzed film clip. Our findings indicate the method's reliability in capturing consistent, dynamic shared neural synchrony among individuals, triggered by evocative film stimuli, underscoring its potential to serve as an indicator of continuous human emotion arousal. The implications are significant for affective computing and the broader neuroscience field, suggesting a streamlined and effective tool for emotion analysis in real-world applications.
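The sliding-window correlation at the heart of the dynamic ISC approach can be sketched in a few lines. This is a minimal illustration on synthetic signals, not the paper's pipeline: the two arrays stand in for one extracted feature (e.g. DE) from the same electrode of two subjects, and window sizes are arbitrary.

```python
import numpy as np

def sliding_isc(x, y, win, step=1):
    """Dynamic inter-subject correlation: Pearson r between two
    subjects' single-electrode feature series in sliding windows."""
    out = []
    for s in range(0, len(x) - win + 1, step):
        out.append(np.corrcoef(x[s:s + win], y[s:s + win])[0, 1])
    return np.array(out)

# Synthetic stand-ins for two subjects watching the same film clip:
# a shared slow component plus subject-specific noise.
t = np.linspace(0, 4 * np.pi, 200)
subj_a = np.sin(t)
subj_b = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

isc = sliding_isc(subj_a, subj_b, win=50, step=25)
```

A consistently high `isc` trace across subjects is what the study reads as shared, stimulus-driven neural synchrony.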
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
Contrastive Analysis of Constituent Order Preferences Within Adverbial Roles in English and Chinese News: A Large-Language-Model-Driven Approach
Based on comparable English-Chinese news corpora annotated by a large language model (LLM), this paper explores differences in constituent order between English and Chinese news from the perspective of functional chunks with adverbial roles, analyzing their typical positional preferences and distribution patterns. It finds that: (1) English news prefers a linear, core-information-first narrative, with functional chunks mostly post-positioned, while Chinese news prefers a background-first mode of presentation, with functional chunks often pre-positioned; (2) in SVO structures, both English and Chinese news show differences in the distribution of functional chunks, but the Chinese tendency toward pre-positioning is more pronounced, while the English tendency toward post-positioning is relatively mild; (3) when functional chunks co-occur, both English and Chinese news show high flexibility, with order adjustments driven by informational and pragmatic purposes. The study reveals that word order exhibits both systematic preferences and dynamic adaptability, providing new empirical support for the contrastive study of English-Chinese information structure.
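Once the LLM has tagged each adverbial chunk as pre- or post-verbal, computing the positional preference per language is a simple frequency count. The rows below are invented toy annotations, not the paper's corpus or label scheme:

```python
from collections import Counter

# Toy LLM-annotated clauses: (language, chunk role, position relative
# to the verb). Labels are illustrative only.
annotated = [
    ("zh", "time", "pre"), ("zh", "place", "pre"), ("zh", "manner", "pre"),
    ("zh", "time", "pre"), ("en", "time", "post"), ("en", "place", "post"),
    ("en", "manner", "post"), ("en", "time", "pre"),
]

def position_preference(rows, lang):
    # Proportion of pre- vs post-positioned chunks for one language.
    counts = Counter(pos for l, _, pos in rows if l == lang)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}

zh = position_preference(annotated, "zh")
en = position_preference(annotated, "en")
```

On this toy data the Chinese chunks are uniformly pre-positioned while the English ones lean post-positioned, mirroring the pattern the paper reports.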
- Asia > China > Beijing > Beijing (0.41)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (8 more...)
- Government (0.93)
- Media > News (0.67)
A Survey of Context Engineering for Large Language Models
Mei, Lingrui, Yao, Jiayu, Ge, Yuyao, Wang, Yiwei, Bi, Baolong, Cai, Yujun, Liu, Jiazhi, Li, Mingyu, Li, Zhong-Zhi, Zhang, Duzhen, Zhou, Chenlin, Mao, Jiayi, Xia, Tianze, Guo, Jiafeng, Liu, Shenghua
The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. We present a comprehensive taxonomy decomposing Context Engineering into its foundational components and the sophisticated implementations that integrate them into intelligent systems. We first examine the foundational components: context retrieval and generation, context processing, and context management. We then explore how these components are architecturally integrated to create sophisticated system implementations: retrieval-augmented generation (RAG), memory systems, tool-integrated reasoning, and multi-agent systems. Through this systematic analysis of over 1,400 research papers, our survey not only establishes a technical roadmap for the field but also reveals a critical research gap: a fundamental asymmetry between what models can understand and what they can generate. While current models, augmented by advanced context engineering, demonstrate remarkable proficiency in understanding complex contexts, they exhibit pronounced limitations in generating equally sophisticated, long-form outputs. Addressing this gap is a defining priority for future research. Ultimately, this survey provides a unified framework for both researchers and engineers advancing context-aware AI.
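The "context retrieval" component the survey describes can be sketched in its simplest form: score candidate passages against the query and place the top-k into the prompt. This is a bag-of-words cosine toy, far from a production RAG retriever; all strings are illustrative.

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words term counts, lowercased.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, passages, k=1):
    # Rank passages by similarity to the query; return the top-k
    # to be inserted into the model's context window.
    q = bow(query)
    return sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]

docs = [
    "context engineering optimizes the information payload given to an LLM",
    "pointer networks decode permutations over input elements",
]
top = retrieve("how to optimize LLM context payloads", docs, k=1)
```

Real systems swap the bag-of-words scorer for dense embeddings, but the retrieve-then-stuff-the-context shape is the same.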
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (8 more...)
- Health & Medicine (1.00)
- Education (0.92)
- Information Technology > Security & Privacy (0.45)
Continuous Multi-Task Pre-training for Malicious URL Detection and Webpage Classification
Li, Yujie, Liu, Yiwei, Li, Peiyue, Jia, Yifan, Wang, Yanbin
Malicious URL detection and webpage classification are critical tasks in cybersecurity and information management. In recent years, extensive research has explored using BERT or similar language models to replace traditional machine learning methods for detecting malicious URLs and classifying webpages. While previous studies show promising results, they often apply existing language models to these tasks without accounting for the inherent differences in domain data (e.g., URLs being loosely structured and semantically sparse compared to text), leaving room for performance improvement. Furthermore, current approaches focus on single tasks and have not been tested in multi-task scenarios. To address these challenges, we propose urlBERT, a pre-trained URL encoder that leverages a Transformer to encode foundational knowledge from billions of unlabeled URLs. To achieve this, we use five unsupervised pre-training tasks to capture multi-level information about URL lexicon, syntax, and semantics, and to generate contrastive and adversarial representations. Furthermore, to avoid competition and interference among pre-training tasks, we propose a grouped sequential learning method to ensure effective training across tasks. Finally, we leverage a two-stage fine-tuning approach to improve the training stability and efficiency of the task model. To assess the multitasking potential of urlBERT, we fine-tune the task model in both single-task and multi-task modes: the former creates a classification model for a single task, while the latter builds a classification model capable of handling multiple tasks. We evaluate urlBERT on three downstream tasks: phishing URL detection, advertising URL detection, and webpage classification. The results demonstrate that urlBERT outperforms standard pre-trained models and that its multi-task mode can address the real-world demands of multitasking.
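The grouped sequential learning idea — running the pre-training objectives in groups, one phase at a time, instead of all at once — reduces to a training schedule. The sketch below only builds such a schedule; the task names are invented placeholders, not the paper's actual five objectives.

```python
def grouped_sequential(groups, steps_per_group=2):
    """Build a phase-by-phase schedule: every task in group 0 is
    trained before any task in group 1, and so on, so objectives in
    different groups never compete within the same phase."""
    schedule = []
    for group in groups:
        for step in range(steps_per_group):
            for task in group:
                schedule.append((task, step))
    return schedule

# Hypothetical grouping of the five unsupervised objectives.
groups = [
    ["masked_token", "token_order"],             # lexical/syntactic group
    ["contrastive", "adversarial", "semantic"],  # representation group
]
plan = grouped_sequential(groups)
```

Within a group the tasks still interleave step by step, so closely related objectives can share gradients while distant ones are kept in separate phases.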
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
PAPN: Proximity Attention Encoder and Pointer Network Decoder for Parcel Pickup Route Prediction
Denis, Hansi, Mercelis, Siegfried, Luong, Ngoc-Quang
Optimization of last-mile delivery and first-mile pickup of parcels is an integral part of the broader logistics optimization pipeline, as it yields cost and resource efficiency as well as heightened service quality. Such optimization requires accurate route and time prediction systems that can adapt to different scenarios in advance. This work tackles the first building block, route prediction. It introduces a novel Proximity Attention mechanism in an encoder-decoder architecture that uses a Pointer Network in the decoding process (Proximity Attention Encoder and Pointer Network decoder: PAPN), leveraging the underlying connections between the different visitable pickup positions at each timestep. This local attention process is coupled with global context computation via a multi-head-attention Transformer encoder. The resulting global context is then mixed with an aggregated version of the local embedding, combining global and local attention for complete modeling of the problem. Proximity attention is also used in the decoding process to skew predictions toward the locations with the highest attention scores, thus using the inter-connectivity of locations as a basis for next-location prediction. The method is trained, validated, and tested on LaDE[1], a large industry-level dataset of real-world, large-scale last-mile delivery and first-mile pickup. The approach shows noticeable promise, outperforming all state-of-the-art supervised systems on most benchmark metrics for this dataset while remaining competitive with the best-performing reinforcement-learning method, DRL4Route[2].
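The core "skew predictions toward nearby locations" step can be illustrated as a distance-penalized softmax. This is a one-step toy, not the PAPN decoder: the raw scores, distances, and temperature are invented for illustration.

```python
import numpy as np

def proximity_attention(scores, dist, tau=1.0):
    """Bias decoder attention by proximity: subtract a distance
    penalty from the raw scores before the softmax, so closer
    unvisited stops receive higher next-location probability."""
    z = scores - dist / tau
    e = np.exp(z - z.max())  # stable softmax
    return e / e.sum()

raw = np.array([1.0, 1.0, 1.0])   # decoder scores, equal on purpose
dist = np.array([0.2, 2.0, 5.0])  # distance from the current position
p = proximity_attention(raw, dist)
```

With identical raw scores, the probability mass ends up ordered by proximity alone, which is exactly the inductive bias the mechanism adds on top of the pointer network's learned scores.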
- North America > United States > Texas > El Paso County > El Paso (0.05)
- Asia > China > Shandong Province > Yantai (0.05)
- Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
- (5 more...)
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Cheng, Xianfu, Zhang, Wei, Zhang, Shiwei, Yang, Jian, Guan, Xiangyuan, Wu, Xianjie, Li, Xiang, Zhang, Ge, Liu, Jiaheng, Mai, Yuying, Zeng, Yutao, Wen, Zhoufutu, Jin, Ke, Wang, Baorui, Zhou, Weixiao, Lu, Yunhong, Li, Tongliang, Huang, Wenhao, Li, Zhoujun
The increasing application of multi-modal large language models (MLLMs) across various sectors has spotlighted the importance of their output reliability and accuracy, particularly their ability to produce content grounded in factual information (e.g., common and domain-specific knowledge). In this work, we introduce SimpleVQA, the first comprehensive multi-modal benchmark to evaluate the factuality of MLLMs when answering natural-language short questions. SimpleVQA is characterized by six key features: it covers multiple tasks and multiple scenarios, ensures high-quality and challenging queries, maintains static and timeless reference answers, and is straightforward to evaluate. Our approach involves categorizing visual question-answering items into 9 tasks around objective events or common knowledge, situated within 9 topics. Rigorous quality-control processes guarantee high-quality, concise, and clear answers, facilitating evaluation with minimal variance via an LLM-as-a-judge scoring system. Using SimpleVQA, we perform a comprehensive assessment of 18 leading MLLMs and 8 text-only LLMs, delving into their image-comprehension and text-generation abilities by identifying and analyzing error cases.
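The scoring loop of a benchmark like this is simple once the judge is fixed. As a stand-in for the LLM-as-a-judge, the sketch below uses a normalized string match; the real system prompts a strong LLM to grade each answer, and all examples here are invented.

```python
def judge(pred, reference):
    """Stand-in for the LLM-as-a-judge: a normalized exact match.
    The real benchmark asks an LLM to grade semantic correctness."""
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(pred) == norm(reference) else 0.0

# Toy predictions vs. static reference answers.
scores = [
    judge("Eiffel Tower", "eiffel  tower"),  # correct up to case/spacing
    judge("Paris", "Lyon"),                  # wrong
]
accuracy = sum(scores) / len(scores)
```

Because the reference answers are short, static, and unambiguous, the judge's variance stays low, which is what makes the benchmark "straightforward to evaluate."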
- North America > United States (0.28)
- Europe > Italy > Sardinia (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (4 more...)
- Research Report (0.64)
- Workflow (0.46)
Multi-Order Hyperbolic Graph Convolution and Aggregated Attention for Social Event Detection
Liu, Yao, Liu, Zhilan, Tan, Tien Ping, Li, Yuxin
Social event detection (SED) is a task focused on identifying specific real-world events and has broad applications across various domains. It is integral to many mobile applications with social features, including major platforms like Twitter, Weibo, and Facebook. By enabling the analysis of social events, SED provides valuable insights for businesses to understand consumer preferences and supports public services in handling emergencies and disaster management. Due to the hierarchical structure of event detection data, traditional approaches in Euclidean space often fall short in capturing the complexity of such relationships. While existing methods in both Euclidean and hyperbolic spaces have shown promising results, they tend to overlook multi-order relationships between events. To address these limitations, this paper introduces a novel framework, Multi-Order Hyperbolic Graph Convolution with Aggregated Attention (MOHGCAA), designed to enhance the performance of SED. Experimental results demonstrate significant improvements under both supervised and unsupervised settings. To further validate the effectiveness and robustness of the proposed framework, we conducted extensive evaluations across multiple datasets, confirming its superiority in tackling common challenges in social event detection.
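A common building block of hyperbolic graph convolution is aggregating neighbor features via the tangent space: map points from the Poincaré ball to the tangent space at the origin, average there, and map back. The sketch below shows that step for curvature -1; it is a generic illustration of the operation, not the MOHGCAA model, and the inputs are arbitrary.

```python
import numpy as np

def log0(x):
    # Logarithmic map at the origin of the Poincare ball (curvature -1):
    # ball point -> tangent vector.
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / n if n > 0 else x

def exp0(v):
    # Exponential map at the origin: tangent vector -> ball point.
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def hyper_aggregate(neighbors):
    # One hyperbolic aggregation step: average in tangent space,
    # then project back into the ball.
    tangent_mean = np.mean([log0(x) for x in neighbors], axis=0)
    return exp0(tangent_mean)

nbrs = [np.array([0.3, 0.0]), np.array([0.0, 0.3])]
agg = hyper_aggregate(nbrs)
```

The round trip through the tangent space keeps the aggregate inside the unit ball, which is what lets hierarchical (tree-like) event structure be represented with low distortion.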
- Asia > China > Sichuan Province > Chengdu (0.05)
- Asia > Malaysia > Penang (0.04)
- Europe > United Kingdom > England > Surrey (0.04)
- (4 more...)